Midterm Assignment

과목: 실험통계학 - 중간고사 대체과제
담당교수: 구준모 교수님
학과: 기계공학과
학번: 2016100724
제출일: 2021.05.05. (WED)
이름: 박수빈

1. 공기청정기 가동여부에 따라 I/O Ratio의 평균으로 고려하는 경우

(a) 데이터 확인

  • 필요한 라이브러리 호출
import numpy as np
import pandas as pd
import scipy as sp
from scipy import stats
import plotly.express as px
import plotly.io as pio
from plotly import graph_objs as go
import plotly.figure_factory as ff
import chart_studio.plotly as py
import random
import statsmodels.formula.api as smf
from scipy.optimize import minimize
  • 미세먼지 데이터 불러온 후, 데이터 확인
# 미세먼지 데이터 가져오기
PM_measured = pd.read_csv("office_data.csv")
PM_measured = PM_measured.dropna() # 결측치(NA) 제거
print(PM_measured)
##          dayYear  PM10_indoor  PM2.5_indoor  PM10_outdoor  PM2.5_outdoor  PURI
## 0     729.001389          6.0           6.0         54.20          45.10     0
## 1     729.004861          6.1           6.1         53.50          44.20     0
## 2     729.008333          6.1           6.1         52.80          44.00     0
## 3     729.011806          6.0           6.0         53.70          43.80     0
## 4     729.015278          6.0           6.0         53.20          43.90     0
## ...          ...          ...           ...           ...            ...   ...
## 6235  750.700694          1.2           1.1         14.42           6.94     1
## 6236  750.704167          1.2           1.1         15.00           7.08     1
## 6237  750.707639          1.2           1.2         14.72           7.08     1
## 6238  750.711111          1.1           1.1         14.70           7.00     1
## 6239  750.714583          1.1           1.0         15.50           7.06     1
## 
## [6240 rows x 6 columns]
# 데이터 개형 확인하기
pd.options.plotting.backend = "plotly"
pio.templates.default = "presentation"
PM_fig = px.line(PM_measured, x = "dayYear", y = ["PM10_outdoor", "PM10_indoor", "PM2.5_outdoor", "PM2.5_indoor"], line_group = "PURI", title = "Historical PM Data", labels={"dayYear": "Day", "value": "Concentration (ug/m3)"})
PM_fig = PM_fig.update_layout(
    font_family="Times New Roman",
    font_color="black",
    font_size = 18,
    title_font_family="Times New Roman",
    title_font_color="black",
    title_font_size = 24,
    legend_title_font_color="black"
)
PM_fig.write_html("Fig1.html")

Show PM_fig

# 데이터 박스플롯 확인하기
pio.templates.default = "presentation"
fig = px.box(PM_measured, y = ["PM10_outdoor", "PM2.5_outdoor", "PM10_indoor", "PM2.5_indoor"], color = "PURI", labels={"variable": "Size", "value": "Concentration (ug/m3)"})
fig = fig.update_layout(
    font_family="Times New Roman",
    font_color="black",
    font_size = 18,
    title_font_family="Times New Roman",
    title_font_color="black",
    title_font_size = 24,
    legend_title_font_color="black"
)
fig.write_html("Fig2.html")

Show fig

(b) 각 평균의 신뢰구간을 고려하여 실내외의 먼지 농도가 통계적으로 유의미하게 차이나는지 분석

  • 공기청정기 가동여부에 따라 그룹화한 후, 평균과 불편분산 확인
PM_grouped = PM_measured.groupby("PURI")
PM_grouped.size()
## PURI
## 0    3867
## 1    2373
## dtype: int64
PM_mean = PM_grouped.mean()
PM_var = PM_grouped.var(ddof = 1)
print(PM_mean); print(PM_var)
##          dayYear  PM10_indoor  PM2.5_indoor  PM10_outdoor  PM2.5_outdoor
## PURI                                                                    
## 0     735.730384     6.349909      6.127592     18.362054      10.284668
## 1     746.586644     2.134471      2.002107     36.415659      19.585485
##         dayYear  PM10_indoor  PM2.5_indoor  PM10_outdoor  PM2.5_outdoor
## PURI                                                                   
## 0     15.101897     5.824322      5.420881    106.820093      61.182525
## 1      5.708430     1.374275      1.112786    527.551883     103.175443
  • I/O Ratio에 대한 데이터로 나타낸 후, 평균과 표준편차 확인
# 데이터 변형
PM_off = PM_measured[PM_measured["PURI"] == 0]
PM_on = PM_measured[PM_measured["PURI"] == 1]
PM_off_io10 = PM_off["PM10_indoor"] / PM_off["PM10_outdoor"]
PM_off_io25 = PM_off["PM2.5_indoor"] / PM_off["PM2.5_outdoor"]
PM_on_io10 = PM_on["PM10_indoor"] / PM_on["PM10_outdoor"]
PM_on_io25 = PM_on["PM2.5_indoor"] / PM_on["PM2.5_outdoor"]
PM_off_ratio = pd.DataFrame({"PM10": PM_off_io10, "PM2.5": PM_off_io25})
PM_on_ratio = pd.DataFrame({"PM10": PM_on_io10, "PM2.5": PM_on_io25})
# I/O Ratio의 평균
PM_off_mean = PM_off_ratio.mean()
PM_on_mean = PM_on_ratio.mean()
print(PM_off_mean); print(PM_on_mean)
## PM10     0.412098
## PM2.5    0.740559
## dtype: float64
## PM10     0.068713
## PM2.5    0.111222
## dtype: float64
# I/O Ratio의 표준편차
PM_off_std = PM_off_ratio.std(ddof = 1)
PM_on_std = PM_on_ratio.std(ddof = 1)
print(PM_off_std); print(PM_on_std)
## PM10     0.170310
## PM2.5    0.288033
## dtype: float64
## PM10     0.033774
## PM2.5    0.044983
## dtype: float64
  • 평균의 신뢰구간 고려: t-검정 이용 (유의수준 = 0.05)

    • 귀무가설: I/O Ratio의 평균이 1이다. (실내외 먼지 농도가 통계적으로 유의미하게 차이나지 않는다)

    • 대립가설: I/O Ratio의 평균이 1이 아니다. (실내외 먼지 농도가 통계적으로 유의미한 차이를 보인다)

# I/O Ratio 데이터의 길이와 표준오차 구하기
size = np.array([len(PM_off_ratio), len(PM_on_ratio)])
df = size - 1
PM_off_se = PM_off_std / np.sqrt(size[0])
PM_on_se = PM_on_std / np.sqrt(size[1])
# 공기청정기 가동 여부, 입자 크기에 따라 구분하여 t-test
PM_off_t_10 = stats.ttest_1samp(PM_off_ratio["PM10"], 1)
PM_off_t_25 = stats.ttest_1samp(PM_off_ratio["PM2.5"], 1)
PM_on_t_10 = stats.ttest_1samp(PM_on_ratio["PM10"], 1)
PM_on_t_25 = stats.ttest_1samp(PM_on_ratio["PM2.5"], 1)
matrix_tsample = [
    ['', 't-value', 'p-value'],
    ['PM10_off', PM_off_t_10[0], PM_off_t_10[1]],
    ['PM2.5_off', PM_off_t_25[0], PM_off_t_10[1]],
    ['PM10_on', PM_on_t_10[0], PM_on_t_10[1]],
    ['PM2.5_on', PM_on_t_25[0], PM_on_t_10[1]]
]
PM_t_table = ff.create_table(matrix_tsample, index=True)
PM_t_table.write_html("Table1.html")

Show PM_t_table

# 평균의 t-분포에 대한 95% 신뢰구간과 t-값 비교
confint1 = stats.t.interval(alpha = 0.95, df = df[0], loc = PM_off_mean[0], scale = PM_off_se[0])
confint2 = stats.t.interval(alpha = 0.95, df = df[0], loc = PM_off_mean[1], scale = PM_off_se[1])
confint3 = stats.t.interval(alpha = 0.95, df = df[1], loc = PM_on_mean[0], scale = PM_on_se[0])
confint4 = stats.t.interval(alpha = 0.95, df = df[1], loc = PM_on_mean[1], scale = PM_on_se[1])
matrix_conf = [
    ['Confint Table', 't-value', "Lower", "Upper"],
    ['PM10_off', PM_off_t_10[0], confint1[0], confint1[1]],
    ['PM2.5_off', PM_off_t_25[0], confint2[0], confint2[1]],
    ['PM10_on', PM_on_t_10[0], confint3[0], confint3[1]],
    ['PM2.5_off', PM_on_t_25[0], confint4[0], confint4[1]]
]
conf_table = ff.create_table(matrix_conf, index = True)
conf_table.write_html("Table2.html")

Show conf_table

(c) 두 비의 차이를 이용해 공기청정기 효과를 통계적으로 분석

  • 점추정 필요: 두 비의 평균값의 차이와 표준오차 확인
# PM10 평균 비교
PM_off_mean["PM10"] - PM_on_mean["PM10"]
## 0.3433858682283977
# PM10 표준오차
print(PM_off_se["PM10"]); print(PM_on_se["PM10"])
## 0.002738753954937508
## 0.0006933256088557901
# PM2.5 평균 비교
PM_off_mean["PM2.5"] - PM_on_mean["PM2.5"]
## 0.6293367572119647
# PM2.5 표준오차
print(PM_off_se["PM2.5"]); print(PM_on_se["PM2.5"])
## 0.004631861475851101
## 0.0009234151662967842

2. 주어진 시간에서의 I/O Ratio를 구하고 공기청정기 가동여부에 따라 비교

  • 시간에 따른 I/O Ratio로 데이터프레임 생성 후 확인
# Dataframe 생성
PM_ratio = pd.DataFrame({"PM10":PM_measured["PM10_indoor"] / PM_measured["PM10_outdoor"], "PM2.5": PM_measured["PM2.5_indoor"] / PM_measured["PM2.5_outdoor"], "PURI": PM_measured["PURI"]})
PM_ratio["dayYear"] = PM_measured["dayYear"]
# 입자 크기별로 나누어 I/O Ratio를 나타낸 그래프
PM_fig2 = px.line(PM_ratio, x = "dayYear", y = ["PM10", "PM2.5"], line_group = "PURI", title = "Historical PM Ratio", labels={"dayYear": "Day", "value": "I/O Ratio (Cin/Cout)", "variable": "Size"})
PM_fig2 = PM_fig2.update_layout(
    font_family="Times New Roman",
    font_color="black",
    font_size = 18,
    title_font_family="Times New Roman",
    title_font_color="black",
    title_font_size = 24,
    legend_title_font_color="black"
)
PM_fig2.write_html("Fig3.html")

Show PM_fig2

# Box plot 확인
fig2 = px.box(PM_ratio, y = ["PM10", "PM2.5"], color = "PURI", labels={"variable": "Size", "value": "I/O Ratio (Cin/Cout)"})
fig2 = fig2.update_layout(
    font_family="Times New Roman",
    font_color="black",
    font_size = 18,
    title_font_family="Times New Roman",
    title_font_color="black",
    title_font_size = 24,
    legend_title_font_color="black"
)
fig2.write_html("Fig4.html")

Show fig2

  • 두 비의 차로 나타내어 분석: t-검정 - 단측검정 (유의수준 = 0.05)

    • 기준: I/O Ratio의 차이 vs “0”

    • 귀무가설: 두 비의 차이가 같다.

    • 대립가설: 두 비의 차이가 0보다 크다 or 작다.

# 두 데이터의 길이가 다름 -> 샘플링 필요 (무작위로 수행)
PM_ratio_off = PM_ratio[PM_ratio["PURI"] == 0]
PM_ratio_on = PM_ratio[PM_ratio["PURI"] == 1]
PM_ratio_off2 = PM_ratio_off.sample(len(PM_ratio_on), replace = False)

# index 초기화
PM_ratio_off2 = PM_ratio_off2.reset_index()
PM_ratio_on = PM_ratio_on.reset_index()
# 공기 청정기를 껐을 때의 I/O Ratio - 켰을 때의 I/O Ratio
PM_ratio_sub = PM_ratio_off2 - PM_ratio_on
PM_ratio_size = len(PM_ratio_sub)
PM_ratio_df = PM_ratio_size - 1
PM_ratio_mean = PM_ratio_sub.mean()
PM_ratio_se = PM_ratio_sub.std(ddof = 1) / np.sqrt(PM_ratio_size)
print(PM_ratio_mean); print(PM_ratio_se)
## index     -3125.348925
## PM10          0.339883
## PM2.5         0.626490
## PURI         -1.000000
## dayYear     -10.874826
## dtype: float64
## index      26.735553
## PM10        0.003533
## PM2.5       0.006056
## PURI        0.000000
## dayYear     0.093105
## dtype: float64
# 95% 유의수준에서 신뢰구간 형성
confint_ratio = stats.t.interval(alpha = 0.95, df = PM_ratio_df, loc = PM_ratio_mean[1], scale = PM_ratio_se[1])
confint_ratio2 = stats.t.interval(alpha = 0.95, df = PM_ratio_df, loc = PM_ratio_mean[2], scale = PM_ratio_se[2])

# 두 비의 차이와 0을 비교하는 t-test
PM_ratio_ttest10 = stats.ttest_1samp(PM_ratio_sub["PM10"], 0)
PM_ratio_ttest25 = stats.ttest_1samp(PM_ratio_sub["PM2.5"], 0)
matrix_conf2 = [
    ['Confint Table', 't-value', "Lower", "Upper"],
    ['PM10', PM_ratio_ttest10[0], confint_ratio[0], confint_ratio[1]],
    ['PM2.5', PM_ratio_ttest25[0], confint_ratio2[0], confint_ratio2[1]]
]

# 결과 확인
conf_table2 = ff.create_table(matrix_conf2, index = True)
conf_table2.write_html("Table3.html")

Show conf_table2

3. 회귀분석을 이용해서 외부 (초)미세먼지 농도에 따라 실내 미세먼지 농도가 어떻게 될 지를 통계적으로 분석

  • 모델: \(\frac{dC_{in}}{dt} = ACH \cdot P \cdot C_{out} (t) - (ACH + k) \cdot C_{in} (t) + \dot{S}\)

  • 재실인원, 문 개폐 여부에 대한 정보가 주어지지 않음

  • 따라서 하루 단위로 나누어 입자 입경별로, 그리고 공기청정기 가동 여부에 따라 분석 진행

# 데이터 불러오기
data <- read.csv("office_data.csv")
data <- data[complete.cases(data),]
  • ODE: Trapezoidal Method로 풀이

    • \(C_{in,n+1}=C_{in,n} + \frac{h}{2}\cdot \left (f(t_n, C_{in,n}) + f(t_{n+1}, C_{in,n+1}) \right )\)
# Cin을 구하는 함수 만들기
pred_Cin <- function(parameters, t, Cin, Cout){
    ACH_P <- parameters[[1]]
    ACH_k <- parameters[[2]]
    S <- parameters[[3]]
    pred <- Cin[1]
    for (i in seq(1, length(t) - 1)){
        h <- t[i+1] - t[i]
        pred[i+1] <- (pred[i] + h / 2 * (ACH_P * (Cout[i] + Cout[i+1]) - ACH_k * pred[i] + 2 * S)) / (1 + h / 2 * ACH_k)
    }
    return (data.frame(dayYear = t, PM_measured = Cin, PM_predicted = pred))
}
# 최소제곱법을 활용하기 위해 모델 예측값과 실측값 사이의 잔차를 구하는 함수 만들기
resid <- function(parameters, t, Cin, Cout){
    temp <- pred_Cin(parameters, t, Cin, Cout)
    return (sum((temp$PM_measured - temp$PM_predicted)^2))
}
# 공기청정기 가동 여부에 따라 데이터를 구분하고, 하루단위로 모드 설정
data_off <- data[which(data$PURI == 0),]
data_on <- data[which(data$PURI == 1),]
data_off$mode <- 0
for (i in seq(14, 1)){
    data_off[which(data_off$dayYear <= 729 + i),]$mode <- i
}
data_on$mode <- 0
for (i in seq(9, 1)){
    data_on[which(data_on$dayYear <= 742 + i),]$mode <- i
}
  • scipy.optimize.minimize 대신, R의 DEoptim 사용
# Optimization의 Bound 설정
library(DEoptim)
Lp <- c(0, 0, 0)
Up <- c(100, 100, 100)
# PM10, PURI = 0
param1 <- NULL
temp <- NULL
for (i in seq(1, 14)){
    temp <- data_off[which(data_off$mode == i),]
    opt <- DEoptim(resid, Lp, Up, control = DEoptim.control(trace = FALSE, NP = 100, itermax = 500, parallelType = 1, parVar = c("resid", "pred_Cin")), t = temp$dayYear, Cin = temp$PM10_indoor, Cout = temp$PM10_outdoor)
    param1 <- rbind(param1, unlist((opt$optim$bestmem)))
}
param1 <- as.data.frame(param1)
model_data1 <- NULL
for (i in seq(1, 14)){
    model_data1 <- rbind(model_data1, pred_Cin(param1[i,], data_off$dayYear[which(data_off$mode == i)], data_off$PM10_indoor[which(data_off$mode == i)], data_off$PM10_outdoor[which(data_off$mode == i)]))
}
# PM2.5, PURI = 0
param2 <- NULL
temp <- NULL
for (i in seq(1, 14)){
    temp <- data_off[which(data_off$mode == i),]
    opt <- DEoptim(resid, Lp, Up, control = DEoptim.control(trace = FALSE, NP = 100, itermax = 500, parallelType = 1, parVar = c("resid", "pred_Cin")), t = temp$dayYear, Cin = temp$PM2.5_indoor, Cout = temp$PM2.5_outdoor)
    param2 <- rbind(param2, unlist((opt$optim$bestmem)))
}
param2 <- as.data.frame(param2)
model_data2 <- NULL
for (i in seq(1, 14)){
    model_data2 <- rbind(model_data2, pred_Cin(param2[i,], data_off$dayYear[which(data_off$mode == i)], data_off$PM2.5_indoor[which(data_off$mode == i)], data_off$PM2.5_outdoor[which(data_off$mode == i)]))
}
# PM10, PURI = 1
param3 <- NULL
temp <- NULL
for (i in seq(1, 9)){
    temp <- data_on[which(data_on$mode == i),]
    opt <- DEoptim(resid, Lp, Up, control = DEoptim.control(trace = FALSE, NP = 100, itermax = 500, parallelType = 1, parVar = c("resid", "pred_Cin")), t = temp$dayYear, Cin = temp$PM10_indoor, Cout = temp$PM10_outdoor)
    param3 <- rbind(param3, unlist((opt$optim$bestmem)))
}
param3 <- as.data.frame(param3)
model_data3 <- NULL
for (i in seq(1, 9)){
    model_data3 <- rbind(model_data3, pred_Cin(param3[i,], data_on$dayYear[which(data_on$mode == i)], data_on$PM10_indoor[which(data_on$mode == i)], data_on$PM10_outdoor[which(data_on$mode == i)]))
}
# PM2.5, PURI = 1
param4 <- NULL
temp <- NULL
for (i in seq(1, 9)){
    temp <- data_on[which(data_on$mode == i),]
    opt <- DEoptim(resid, Lp, Up, control = DEoptim.control(trace = FALSE, NP = 100, itermax = 500, parallelType = 1, parVar = c("resid", "pred_Cin")), t = temp$dayYear, Cin = temp$PM2.5_indoor, Cout = temp$PM2.5_outdoor)
    param4 <- rbind(param4, unlist((opt$optim$bestmem)))
}
param4 <- as.data.frame(param4)
model_data4 <- NULL
for (i in seq(1, 9)){
    model_data4 <- rbind(model_data4, pred_Cin(param4[i,], data_on$dayYear[which(data_on$mode == i)], data_on$PM2.5_indoor[which(data_on$mode == i)], data_on$PM2.5_outdoor[which(data_on$mode == i)]))
}
  • 결정된 모델 계수
param <- rbind(param1, param2, param3, param4)
names(param) <- c("ACH_P", "ACH_k", "S")
param
##           ACH_P        ACH_k            S
## 1  9.328329e-17 2.564634e-15 6.699289e+00
## 2  2.179061e-15 9.809872e-01 1.813316e-16
## 3  2.567834e+00 8.179470e+00 4.007757e-15
## 4  6.603540e+00 1.978024e+01 3.406573e-14
## 5  1.011317e+00 4.738561e+00 5.405171e-15
## 6  1.375071e-16 1.102722e+01 6.004932e+01
## 7  3.957095e+00 2.210519e+01 1.000000e+02
## 8  5.788817e+00 1.893160e+01 4.836396e+01
## 9  6.583253e+00 1.194040e+01 9.574926e-15
## 10 2.503989e+00 1.947362e+01 6.861569e+01
## 11 6.772222e+00 2.306236e+01 5.223411e+01
## 12 9.571093e-16 5.273472e+00 3.170811e+01
## 13 2.246358e-01 1.536079e+01 1.000000e+02
## 14 1.248887e-14 6.833919e+00 6.324004e+01
## 15 3.632864e-16 7.957497e-16 5.729020e+00
## 16 1.274366e+00 2.347041e+00 2.040001e-14
## 17 1.348886e+01 3.070851e+01 4.558789e+01
## 18 2.364825e+00 4.916577e+00 1.841579e-14
## 19 2.963524e+00 5.344704e+00 6.342729e-16
## 20 7.129675e+00 1.393974e+01 3.798708e-12
## 21 3.001183e+00 4.344621e+00 1.202283e-13
## 22 9.811753e+00 2.704086e+01 1.000000e+02
## 23 3.378944e+01 3.608690e+01 1.080628e-13
## 24 5.590697e+00 1.089545e+01 2.199322e+01
## 25 1.118405e+01 1.159817e+01 2.767587e-13
## 26 5.455330e-15 1.012792e+01 5.848473e+01
## 27 8.791260e-01 1.624850e+01 1.000000e+02
## 28 4.681856e-16 7.061785e+00 6.496993e+01
## 29 6.864763e+00 1.000000e+02 4.201423e+01
## 30 1.241156e+00 1.453465e+01 1.164515e-14
## 31 7.350835e-17 3.736645e+01 1.000000e+02
## 32 1.190448e+00 8.244459e+01 1.000000e+02
## 33 4.229658e-02 1.463209e-13 5.144498e-14
## 34 1.045657e+00 1.335381e+01 8.275653e+00
## 35 3.190458e-01 4.589139e+00 2.476456e-15
## 36 1.025227e+00 1.374493e+01 2.473571e-15
## 37 1.410475e+00 2.202637e+01 9.950422e-16
## 38 9.600310e+00 1.000000e+02 1.747207e+01
## 39 1.590195e+00 1.588400e+01 2.970043e-14
## 40 1.053695e-15 3.447842e+01 8.636807e+01
## 41 2.249060e+00 4.037713e+01 2.536631e+01
## 42 1.505802e+00 1.617964e+01 9.093305e-15
## 43 1.551116e+00 1.244401e+01 3.340662e+00
## 44 5.871379e-01 5.933324e+00 6.976404e-15
## 45 1.015290e+00 8.685282e+00 2.216446e-15
## 46 3.438771e+00 2.440936e+01 1.979982e-15
  • 결정한 모델로부터 예측 정확도 평가: 결정계수 (\(r^2\))
getRSQ <- function(Exp, Pred){
    rmd=which(is.na(Pred))
    if(length(rmd)>0){
        Exp=Exp[-rmd]
        Pred=Pred[-rmd]
    }
    St <- sum((Exp-mean(Exp))^2)
    S <- sum((Exp-Pred)^2)
    return((St-S)/St)
}
rsq1 <- getRSQ(model_data1$PM_measured, model_data1$PM_predicted) # PM10, off
rsq2 <- getRSQ(model_data2$PM_measured, model_data2$PM_predicted) # PM2.5, off
rsq3 <- getRSQ(model_data3$PM_measured, model_data3$PM_predicted) # PM10, on
rsq4 <- getRSQ(model_data4$PM_measured, model_data4$PM_predicted) # PM2.5, on
rsq <- data.frame("PM10_off" = rsq1, "PM2.5_off" = rsq2, "PM10_on" = rsq3, "PM2.5_on" = rsq4)
rsq
##    PM10_off PM2.5_off   PM10_on  PM2.5_on
## 1 0.6899945 0.6935907 0.7838796 0.9074512
  • 모델 피팅 결과와 측정값 비교
# 모델 예측값들로 데이터프레임 구성
model_data <- rbind(cbind(model_data1, model_data2), cbind(model_data3, model_data4))
names(model_data) <- c("dayYear", "PM10_measured", "PM10_predicted", "dayYear2", "PM2.5_measured", "PM2.5_predicted")
model_data$PURI <- data$PURI
# 그래프 그리기
library(plotly)
fig <- plot_ly(model_data, x = ~dayYear, y = ~PM10_measured, type = "scatter", mode = "lines", name = "PM10_measured", line = list(width = 3, dash = "dot")) %>% 
    add_trace(y = ~PM10_predicted, name = "PM10_predicted", line = list(width = 3, dash = "dash")) %>% 
    add_trace(y = ~PM2.5_predicted, name = "PM2.5_predicted", line = list(width = 3)) %>% 
    add_trace(y = ~PM2.5_measured, name = "PM2.5_measured", line = list(width = 3, dash = "dash")) %>% 
    layout(title = "Comparison of Measured and Predicted Data",
           xaxis = list(title = "Day", zeroline = TRUE),
           yaxis = list(title = "Concentration (ug/m3)"))
fig